feat: add SKIPPED async job status to skip jobs with no data to return#980
feat: add SKIPPED async job status to skip jobs with no data to return#980devin-ai-integration[bot] wants to merge 2 commits intomainfrom
Conversation
Add a new terminal SKIPPED status to AsyncJobStatus that allows connectors to indicate a job should be silently skipped (no records fetched, no retries, no errors). This is useful for APIs like Amazon SP-API where CANCELLED means 'no data to return' and retrying is wasteful. Changes: - Add SKIPPED to AsyncJobStatus enum (terminal status) - Add _process_skipped_partition to AsyncJobOrchestrator (frees job budget, no yield) - Add SKIPPED handling in partition status aggregation - Add optional 'skipped' key to AsyncJobStatusMap schema (backward-compatible) - Add 'skipped' mapping in model_to_component_factory - Handle None values in _create_async_job_status_mapping for optional fields - Add unit tests for SKIPPED partition status and orchestrator behavior Co-Authored-By: bot_apk <apk@cognition.ai>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1775485391-add-skipped-async-job-status#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1775485391-add-skipped-async-job-statusPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
PyTest Results (Fast)4 018 tests +6 4 007 ✅ +6 7m 48s ⏱️ +9s Results for commit eaf016a. ± Comparison against base commit 5b14f41. This pull request removes 1 and adds 7 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
The generic None check could silently skip a required field that was accidentally None. Now only fields in _OPTIONAL_ASYNC_STATUS_FIELDS (currently just 'skipped') are allowed to be None. Required fields (running, completed, failed, timeout) raise ValueError if None. Co-Authored-By: bot_apk <apk@cognition.ai>
| # This is an element of the dict because of the typing of the CDK but it is not a CDK status | ||
| continue | ||
|
|
||
| if api_statuses is None: |
There was a problem hiding this comment.
Good catch! Fixed in eaf016a — the None guard is now scoped to only the optional skipped field via _OPTIONAL_ASYNC_STATUS_FIELDS = {"skipped"}. If a required field (running, completed, failed, timeout) is ever accidentally None, it will now raise a ValueError instead of being silently skipped.
PyTest Results (Full)4 021 tests +6 4 009 ✅ +6 11m 11s ⏱️ -1s Results for commit eaf016a. ± Comparison against base commit 5b14f41. This pull request removes 1 and adds 7 tests. Note that renamed tests count towards both. |
|
/prerelease
|
Summary
Adds a new terminal
SKIPPEDstatus to the CDK's async job framework. When an API indicates a job has no data to return (e.g., Amazon SP-API'sCANCELLEDstatus), connectors can now map it toskippedinstead of forcing it intofailedortimeout— both of which trigger wasteful retries (up to 3x).Behavior: SKIPPED partitions free their job budget and are silently dropped — no records are fetched, no retries are attempted, no errors are raised.
Backward-compatible: The
skippedkey inAsyncJobStatusMapis optional (Optional[List[str]] = None). Existing connector manifests that don't declare it continue to work unchanged.Files changed:
status.py— newSKIPPEDenum member (terminal)job_orchestrator.py— partition status aggregation,_process_skipped_partition, match casedeclarative_component_schema.yaml— optionalskippedproperty onAsyncJobStatusMapdeclarative_component_schema.py— Pydantic model fieldmodel_to_component_factory.py— status mapping + scopedNone-guard that only allowsNonefor optional fields (skipped); required fields raiseValueErrorifNonetest_job_orchestrator.py— 6 new tests (partition status combos + orchestrator behavior)Review & Testing Checklist for Human
declarative_component_schema.pyappears to be auto-generated from the YAML schema. Confirm whether this manual edit will survive the next codegen run, or if the YAML change alone is sufficient. Run/poe buildto regenerate and verify.AsyncPartition.status): Verify the ordering of checks is correct — particularly that{COMPLETED, SKIPPED}resolves toCOMPLETED(not SKIPPED), and thatSKIPPEDdoesn't shadow theFAILED/TIMED_OUTwildcard retry paths. The subset checkstatuses <= {AsyncJobStatus.COMPLETED, AsyncJobStatus.SKIPPED}is the key line._create_async_job_status_mapping: The guard only allowsNonefor fields in_OPTIONAL_ASYNC_STATUS_FIELDS(currently{"skipped"}). Required fields (running,completed,failed,timeout) raiseValueErrorifNone. This is not directly unit-tested — consider adding a test.AsyncJobStatusvalues (e.g.,test_given_one_failed_job_when_status_then_return_failedcreates one job per enum value): These now include a SKIPPED job. Verify the priority expectations still hold with the extra status in the mix.Suggested test plan: Use the
/testslash command to run connector tests with this CDK branch. Then manually verify with a connector that uses async jobs (e.g., source-amazon-seller-partner) by pinning the CDK to this branch and confirming SKIPPED-mapped statuses produce no records and no retries.Notes
CANCELLEDmeans "no data to return" but the CDK currently forces it intotimeout/failed, causing 3 wasteful retries that burn rate-limit budget and can cascade intoFATALerrors.timeout: [CANCELLED]toskipped: [CANCELLED].Link to Devin session: https://app.devin.ai/sessions/45ba0c82b9654fbb8ce037502492bd3d